ABSTRACT

The American Nurses' Association certification 
provides professional recognition beyond licensure to nurses who pass 
an examination. To determine the passing score as it would be set by 
a representative peer group, a survey was mailed to a random sample 
of 200 recently certified nurses. Three questions were asked: (1) 
what percentage of examinees should pass; (2) what percentage of 
questions should be answered correctly for certification; and (3) 
what score should be achieved for certification, given a hypothetical 
distribution of scores on a 75-item test. There were 98 responses. 
Respondents indicated that a mean of 70.36% should pass. The 
currently used Angoff technique resulted in a passing rate of 87.96%. 
The mean percentage of questions that should be answered correctly 
was calculated to be 71.28% according to the survey, 56% as set by 
committee. The mean score that should be achieved for certification 
was calculated to be 53.81 of 75 items (71.75%). In comparison to the current standards 
set by committee, the peer group set higher standards. They also 
showed an unexpected correspondence between percentage correct 
required for passing and the percentage actually passing. Comparison 
models suggested by Hofstee and Beuk, when applied to the survey 
data, indicated that 61-63% of the examinees should pass, with scores 
of 64-67% required to pass. (MGD) 






Congruence of Standard Setting Methods for a Nursing Certification Examination 



Lawrence J. Fabrey and Mark R. Raymond 

Paper presented to the National Council on Measurement in Education 
Annual Meeting, April 1987, Washington, DC. 



Background 

Certification agencies have increasingly adopted absolute standard setting 
procedures in preference to relative ones. In addition, methods for reaching a 
compromise between absolute and relative procedures have been proposed 
(DeGruijter, 1985). After classifying standard setting methods according to 
a judgmental-empirical continuum, Berk (1986) provided an evaluation of the 
technical adequacy and practicability of each. In selecting a method, Berk 
recommended the use of some form of judgmental analysis (for political 
reasons), and use of a conceptually and computationally simple technique (for 
the sake of credibility). 

In describing judgmental methods, Livingston and Zieky (1982) state that 
standard setting judgments should be made in a way that accounts for the 
purpose of the test, by qualified persons, for whom the judgments have 
meaning. Berk (1986) suggested that standard setting issues for educational 
certification test specialists and licensure/certification boards are 
similar, except that with the latter group, sampling judges from a variety of 
populations is not necessary. Traditionally, official boards have been 
responsible for certification or licensure standard setting, but arguments 
could be made for involving other groups in the process. For example, if a 
certification program is intended for professional recognition, a peer group 
of examinees and/or certificants might be most appropriate for standard 
setting. On the other hand, if a program is primarily intended to protect 
the public, perhaps the public should help determine the requisite level of 
knowledge and skill. 

The American Nurses' Association (ANA) certification program is intended 
primarily for professional recognition beyond initial licensure. The purpose 
of this study was to investigate the possible outcomes of asking a 
representative peer group of recent certificants to determine the examination 
passing score. Specifically, this study was designed to assess the degree of 
discrepancy between absolute, relative and compromise standards that would be 
set by certificants, and the extent to which application of the various 
standard setting models would approximate the actual passing point that had 
been set by committee for a recent examination. 

Methods 

A one-page survey was mailed in May 1985 to a random sample of 200 recently 
certified nurses. All respondents had met the same eligibility requirements 
to sit for the examination, had taken the same examination, and had exceeded 
the identical standard. The nurses had taken the examination the previous 
October and had received their score reports approximately four months 
preceding the survey. 

The survey was designed to elicit perceptions regarding at what point a 
standard should be set by relative or absolute methods. Specifically, three 
questions were asked: 

1) What percentage of examinees should pass the national certifying 
examination? 

2) What percentage of questions should an examinee answer correctly in 
order to be certified? 

3) Given the hypothetical distribution of scores on a 75-item test 
shown on the survey, what score should be achieved in order to be 
certified? 

Selected background information was provided for each question. For example, for 
the first question, respondents were asked to bear in mind that criteria 
pertaining to eligibility (e.g. licensure, current practice) had been met. In 
addition, respondents were informed that "there is no correct answer; we are 
seeking your opinion about the proportion of your colleagues that should pass 
the examination." 

For question two, respondents were asked to "bear in mind that it would be 
virtually impossible to get 100 percent correct, and that one would expect to 
get 25 percent correct by random guessing." They were to assume that the test 
questions are relevant and of varied difficulty levels. While the percentage 
correct is a somewhat crude statistic that may tend to perpetuate stereotypical 
standards, it was expected to be more understandable to the respondents, 
particularly since percent correct scores were among those provided on the 
score reports that respondents had received several months previously. 

For question three, the distribution of scores shown was comparable to that of a 
recent candidate group in the certification area being surveyed; the shape was 
identical but the number of examinees and raw score values were changed. In 
addition to presenting the distribution of scores, the minimum, maximum, modal 
and mean scores were noted, to help ensure that respondents would understand the 
table. 

Results 

Eight weeks following the mailing, 98 usable responses were returned on the 
postage paid cards included in the mailing. Because the respondents were assured 
of complete anonymity, no follow-up was attempted. 

Eight respondents chose not to answer the first question dealing with the percent 
that should pass. Most of these individuals indicated that anyone who can 
achieve a certain score should pass. The mean response of the 90 respondents to 
question one was 70.36 percent passing, and the standard deviation was 16.43 
(standard error = 1.66). Responses ranged from 25 to 100 percent, in a 
negatively skewed distribution peaked with 14 responses at 75 percent, and 15 
responses at 80 percent. 

The mean percent correct value in response to the second question was 71.28, and 
the standard deviation was 9.72 (standard error = 0.98). Responses ranged from 
40 to 95 percent correct, and the modal response was 75 (n = 29). While a 
negative correlation between percent passing and percent correct would be 
expected, a positive correlation of .21 (p = .04) was found. 

While the first two questions encouraged the respondents to think in relative and 
absolute terms, respectively, the third question presented a distribution of 
scores and provided no specific guidance. The mean response to question three 
(based on n = 96) was 53.81 (or 71.75 percent of the items correct), and the 
standard deviation was 7.01 (standard error = 0.71). Responses ranged from 22 to 
70, and the distribution peaked at 55 (n = 16) and 56 (n = 15). The correlation 
between the percent correct and number correct responses was .49 (p < .001), 
while the correlation between percent passing and number correct was -.06 (n.s.). 
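
The summary statistics above can be checked with a few lines of arithmetic. The sketch below assumes the usual formula for the standard error of a mean (standard deviation divided by the square root of n); the function name and the layout of the `reported` list are illustrative only. The paper does not say which n was used for each standard error, so the comparison against n = 98 is an arithmetic observation, not a statement of the authors' procedure.

```python
import math

# Summary statistics as reported in the Results section.
# Each entry: (label, standard deviation, n stated for that question, reported SE)
reported = [
    ("Q1 percent passing", 16.43, 90, 1.66),
    ("Q2 percent correct", 9.72, 98, 0.98),
    ("Q3 number correct", 7.01, 96, 0.71),
]

def standard_error(sd, n):
    """Standard error of a mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

for label, sd, n, se_reported in reported:
    se_own_n = standard_error(sd, n)   # using the n stated for that question
    se_all = standard_error(sd, 98)    # using all 98 returned cards
    print(f"{label}: reported {se_reported:.2f}, "
          f"n={n} gives {se_own_n:.2f}, n=98 gives {se_all:.2f}")

# Question three: convert the mean raw score on the 75-item test
# to a percent correct value.
mean_raw = 53.81
n_items = 75
percent_correct = 100 * mean_raw / n_items
print(f"{mean_raw} of {n_items} items = {percent_correct:.2f} percent correct")
```

Under this formula, all three reported standard errors are reproduced with n = 98, and the raw mean of 53.81 converts to the 71.75 percent given in the text.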

The compromise models suggested by Hofstee (1983) and Beuk (1984) were applied to 
the data from the survey. With the Beuk model, the preferred combinations of a 
(cutoff score) and f (passing percentage) were drawn directly from responses to 
the first two questions on the survey. With the Hofstee method, judges are asked 
to provide the minimum and maximum acceptable cutoff scores, and the minimum and 
maximum acceptable percentage of failures. Since the survey in this study asked 
for the preferred cutoff and passing rate, and not acceptable ranges, the ranges 
were fabricated in three different ways: by using extreme values, trimmed 
extremes, and deviations from the mean preferred values derived from the first 
two questions on the survey. 

The passing point resulting from application of the Beuk model was equivalent to 
66 percent correct. Passing points resulting from the three modifications of the 
Hofstee method were: 67 percent correct using extreme values, 66 percent correct 
using trimmed extremes, and 64 percent correct using deviations from mean values. 
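
A minimal sketch of how the two compromise models can be computed follows. It rests on several assumptions that are not in the paper: the 1984 pass-rate curve is approximated by joining the rounded rows of Table 1 with straight lines (the real score distribution is not given); the Beuk line is taken to pass through the judges' mean point with slope equal to the ratio of the two standard deviations, as the method is commonly described; and the "extreme values" version of the Hofstee method is taken to use the reported response ranges (40 to 95 percent correct, 25 to 100 percent passing). All function names are hypothetical. Because two of the curve points are themselves the compromise rows of Table 1, the check is partly circular and should be read as an illustration of the geometry only.

```python
# Pass-rate curve: (percent correct required, percent of 1984 examinees passing).
# ASSUMPTION: rounded rows of Table 1 joined by straight lines.
CURVE = [(56, 88), (64, 70), (66, 63), (67, 61), (71, 46), (72, 42)]

def pass_rate(cutoff):
    """Linearly interpolate the pass rate for a cutoff within the curve's range."""
    for (x0, y0), (x1, y1) in zip(CURVE, CURVE[1:]):
        if x0 <= cutoff <= x1:
            return y0 + (y1 - y0) * (cutoff - x0) / (x1 - x0)
    raise ValueError("cutoff outside the tabulated range")

def intersect(line, lo=56.0, hi=72.0, tol=1e-6):
    """Bisection for the cutoff where the falling curve meets a rising line."""
    def gap(x):
        return pass_rate(x) - line(x)   # positive to the left of the crossing
    if gap(lo) < 0 or gap(hi) > 0:
        raise ValueError("line does not cross the curve in the range")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    cutoff = (lo + hi) / 2
    return cutoff, pass_rate(cutoff)

def beuk_line(mean_cut, sd_cut, mean_pass, sd_pass):
    """Line through the judges' mean point with slope sd_pass / sd_cut."""
    slope = sd_pass / sd_cut
    return lambda x: mean_pass + slope * (x - mean_cut)

def hofstee_line(cut_min, cut_max, pass_min, pass_max):
    """Line from (lowest cutoff, highest fail rate) to (highest cutoff,
    lowest fail rate), written here in terms of pass rates."""
    slope = (pass_max - pass_min) / (cut_max - cut_min)
    return lambda x: pass_min + slope * (x - cut_min)

# Beuk: survey means and standard deviations from questions two and one.
beuk = intersect(beuk_line(71.28, 9.72, 70.36, 16.43))
# Hofstee with extreme values: response ranges 40-95 and 25-100.
hofstee = intersect(hofstee_line(40, 95, 25, 100))

print("Beuk:    cutoff %.1f, pass rate %.1f" % beuk)
print("Hofstee: cutoff %.1f, pass rate %.1f" % hofstee)
```

With this approximate curve, the Beuk intersection falls between 66 and 67 percent correct with a pass rate near 62 percent, and the Hofstee intersection rounds to 67 percent correct and 61 percent passing. Both are close to the values reported for the two methods.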

Finally, the preceding results were compared to the actual standard. The 
standard setting committee had used a modified Angoff technique to set a passing 
score equivalent to 56 percent correct. For the 1984 test administration, this 
standard resulted in a passing rate of 87.96 percent. 



Table 1 

Comparison of Standard Setting Methods 

                                          Percent    Passing 
Basis for pass point                      Correct    Rate 

Actual standard applied in 1984 
  Modified Angoff                            56         88 

Survey results 
  What percentage should pass?               64         70 
  What percent correct is enough?            71         46 
  Given data, what is the pass point?        72         42 
  Hofstee method (using extremes)            67         61 
  Beuk method                                66         63 

Note: Italicized numbers are estimated using 1984 score distribution 
(e.g., a passing point equivalent to 71 percent correct would have 
resulted in passing approximately 46 percent of the examinees). 



Table 1 shows that applying the results of the survey (using any of the methods) 
to the 1984 score distribution would have resulted in a higher standard, and 
consequently, a lower passing rate. Assuming that 70.36 percent would pass, as 
indicated by question one, would have set a standard near 64 percent correct. On 
the other hand, setting the standard at 71.28 percent correct (question two) 
would have resulted in a passing rate near 46 percent. Applying the mean number 
correct (53.81) for the third question to the 1984 distribution would have 
resulted in a passing rate near 42 percent. Finally, applying the modified 
compromise model results would have provided pass rates between 61 and 70 
percent. Regardless of method, the consensus of judges responding to this survey 
was considerably more harsh than that of the standard setting committee. 
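
The conversions described above, from a cutoff to the pass rate it would produce and from a target pass rate back to a cutoff, can be sketched as follows. The score frequencies are hypothetical and are not the 1984 data; the function names are illustrative; and the code assumes an examinee passes by scoring at or above the cutoff, which the paper does not state explicitly.

```python
def pass_rate_at(scores, cutoff):
    """Percent of examinees scoring at or above the cutoff (assumed rule)."""
    passed = sum(1 for s in scores if s >= cutoff)
    return 100.0 * passed / len(scores)

def cutoff_for_pass_rate(scores, target_rate):
    """Highest observed score whose pass rate is still at least target_rate."""
    best = min(scores)
    for cutoff in sorted(set(scores)):
        if pass_rate_at(scores, cutoff) >= target_rate:
            best = cutoff
    return best

# HYPOTHETICAL raw scores on a 75-item test: {score: number of examinees}.
freq = {40: 2, 45: 4, 50: 6, 55: 5, 60: 2, 65: 1}
scores = [score for score, count in freq.items() for _ in range(count)]

# Absolute judgment -> implied pass rate.
for cutoff in (45, 50, 55):
    print(f"cutoff {cutoff}: {pass_rate_at(scores, cutoff):.0f} percent pass")

# Relative judgment -> implied cutoff.
print("cutoff that passes at least 70 percent:",
      cutoff_for_pass_rate(scores, 70))
```

In this toy distribution, raising the cutoff from 50 to 55 drops the pass rate from 70 to 40 percent. The same inverse relationship lies behind the estimated entries in Table 1.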

Discussion 

The major findings of this study include: 

1) the relatively higher standard that would have been set by a peer group 
compared to that set by a committee, 

2) an unexpected correspondence between the respondents' judgments regarding 
percent correct and percent passing, and 

3) documentation of an application of compromise methods to achieve a 
compromise among certificants. 

With actual data, as the required percent correct is increased, more examinees 
fail to exceed the standard. Surprisingly, while a negative correlation would be 
expected between questions one and two, a positive one was found. It could be 
that many respondents were considering an ideal situation in which many 
candidates pass and also achieve high scores. Anecdotally, it is known that 
standard setting committees have similar illusions. However, alternative 
explanations for the inconsistency may be the lack of knowledge of the 
respondents regarding the usual relationship between the two variables, or 
misperceptions regarding the overall difficulty of the examination. 

The correlation of .49 between percent correct and number correct (questions 2 
and 3) was not surprising. This may be an indication that when a complete data 
set is displayed, respondents are inclined to apply absolute 
standards. This interpretation would be consistent with the refusal of eight 
respondents to apply a relative standard. 

Several factors could have contributed to the harshness of the respondents and 
have implications for the limitations of this study. The tendency of raters to 
be harsh while setting absolute standards is not a new phenomenon (see, for 
example, Schoon, Guillion and Ferrara, 1979). This tendency may have been 
reinforced because the individual respondent had already exceeded the standard. 
In fact, a useful replication of this study might include a mechanism for 
identifying the respondent's test score. It could be that respondents were 
applying a standard in some way relative to their own performance, for example, 
just below their score. The absolute standard set by the survey respondents was 
approximately equal to the mean test performance. 

It could also be that the relatively low response rate provided a biased sample. 
The survey was short, but required the respondent to use analytical skills, which 
may have discouraged a portion of the sample. Identification of individual test 
scores or other characteristics of the respondents could provide an indication of 
whether or not a response bias existed. While the response rate was 
disappointing, those who did respond may have more closely matched Livingston and 
Zieky's (1982) criteria for judges; that is, the judgments of respondents 
evidently had greater personal meaning than the judgments of the non-respondents. 



Another potential source of response bias relates to the timing of the survey. 
As discussed previously, the perceptions of the difficulty of the examination may 
not have been accurate; the respondents took the examination approximately eight 
months previously, and received their results approximately five months before 
receiving the survey. For various reasons, it seemed inappropriate to distribute 
the survey at any time other than well after examinees had received their score 
reports. Future studies could be designed to seek judgments at other times, such 
as before or immediately following test administration. 

Finally, the actual standard (equivalent to 56 percent correct) is relatively 
lower than that applied to other ANA certification examinations, and lower than 
other certification examinations, as well. It could be that judgments of 
certificants in other areas would differ. Future research may be directed toward 
addressing some of these factors, which may further the generalizability of the 
results. 

It could be argued that delegating standard setting to examinees or certificants 
is an approach consistent with the purpose of some certification programs because 
it accounts for the purpose of the test, that is, professional recognition. 
Considerable risk could be involved if such a policy were implemented. However, 
examinee judgments regarding standards could be useful for setting, adjusting, 
and defending standards without drastically altering the traditional approach to 
identifying a passing point. After data have been collected from a 
representative candidate group, official boards could find the data useful in 
identifying a standard, or validating an existing standard. Considerations such 
as the timing of the data collection and sampling procedures may be useful in 
formulating future research. 